Neural Networks
Evaluation functions are often difficult to design.
Arthur Samuel, in his pioneering work in machine learning,
attempted to have his program discover through search its own checker
board evaluation function
[Samuel, 1959]. His work has proven
difficult to repeat, and today most game board evaluation functions such as
those in world chess champion Deep Blue
[Hamilton and Garber, 1997]
and world checkers champion
Chinook
[Schaeffer, 1997]
are still primarily designed by hand.
Gerald Tesauro, in his TD-Gammon backgammon program, trained a neural
network to do game board evaluations
[Tesauro, 1995]. His move
generator generates all possible moves from a given die roll, and
passes each resulting board position to the neural net for evaluation.
He trained the neural net by having it play against itself for several
days, modifying
the neural net weights using the temporal difference learning technique
(discussed below). So this system employed search both during development,
to find
the best set of weights for the neural net, and
during gameplay, to find the best move.
Neural Network Basics
Neural networks consist of an input layer, zero or more hidden layers,
and an output layer. The output of each neuron is computed by a weighed
sum of the inputs to the neuron. A non-linear function such as the
sigmoid function is also applied to the output. Training the network
consists of finding the set of weights for each neuron input that
cause it to produce the desired outputs when the input data set is
presented.
Neural network training is also a form
of gradient-descent search. The network is a function approximation
system which
tries to produce the desired output for each training input. The
objective of training is to minimize the error across the set of
all training
inputs. This is the same as finding the minima on a surface, where
the altitude is error and the other dimensions are values for the
network weights.
TD-Gammon
The TD-Gammon network used 198 input units, 40 hidden units, and 4
output units. The inputs encoded the positions of the pieces on the
board. The outputs represented the estimated probability of each
player winning from this position.
Temporal Difference Learning
At the beginning of training weights in neural networks are initialized
to random values. As each test input is presented the generated output
is compared to the desired output, and the weights are adjusted
accordingly. The technique used to assign weight changes across both
output and hidden layers is called back-propagation. This
technique requires a known correct output to compare to the generated
output for each input set. In an earlier neural network backgammon
program Tesauro actually estimated the probability of winning for many
intermediate backgammon board positions, and manually trained the
network. For TD-Gammon he instead used the method of temporal difference.
This method works much like a mathematical induction proof in reverse.
The outputs on turn t+1 are used as the correct values to train the network
for turn t. At the end of the game we know the desired output values,
and these values ripple back through time during the training.
More recently there has been some question about the contribution of
TD learning to the performance of TD-Gammon
[Pollack and Blair, 1996]. Tesauro does
discuss the unique nature of the game of backgammon and why it is more
suitable to neural network learning and evaluation. Major factors include
the use
of dice to distribute play over the fitness landscape, the
one-dimensional nature of the board, and the
fact that all games eventually terminate with a win for one player.
Pollack and Blair assert that starting from random weights, simple co-evolution
between a current champion and a slightly mutated challenger can
produce results comparable to TD-Gammon.
Update: At IJCAI-01 Jonathan Schaeffer announced he had successfully trained the
Chinook checkers evaluation function using temporal difference learning
[Schaeffer, Hlynka and Jussila, 2001].
TD-Gammon image based on
[Nilsson, 1998]
back
index
forward